P-VALUES FOR MULTI-STAGE AND SEQUENTIAL TESTS

by 

Richard  W.  Madsen  and  Kenneth  B.  Fairbanks 
University  of  Missouri-Columbia  and  Murray  State  University 

Summary 

P-values are commonly given for ordinary single stage
statistical tests. In this note we give a general method for
calculating p-values for a large class of multi-stage and
sequential tests. We also give some tables of p-values for
multi-stage tests about the parameter of an exponential
distribution when test plans from MIL-STD-781C are used.

Key  words:  P-values,  multi-stage  tests,  sequential  tests, 
exponential  distribution. 

1.  INTRODUCTION 

It is quite common for investigators to report the results
of a statistical test by giving a p-value rather than
simply stating that the test was (or was not) significant
using an $\alpha$-level test. However, when the statistical test
used is a multi-stage test or a sequential test rather than
a single stage test, p-values are generally not given. It
is the purpose of this note to give a general method for
calculating p-values for a large class of multi-stage and
sequential tests. We also give some tables of p-values for
multi-stage tests about an exponential parameter using test
plans from MIL-STD-781C.

2.  DEFINITION  OF  P-VALUES 

Say that X is a random variable having distribution
function $F(x; \theta)$. Let $H_0$ denote some statistical
hypothesis about F, perhaps a hypothesis about $\theta$ such as
$H_0: \theta \in \Theta_0$. In the single sample case a random sample
is typically chosen from the distribution of X and a test
statistic T is calculated. A critical region $C_\alpha$ is chosen
so that

    $\sup_{\theta \in \Theta_0} P[T \in C_\alpha] = \alpha$.

Generally $C_\alpha$ will consist of the extreme values of T,
perhaps

    $C_\alpha = \{t : t \ge t_\alpha\}$.

In this case we would have $C_{\alpha_1} \subset C_{\alpha_2}$ if $\alpha_1 < \alpha_2$.

It is at this point that the concept of a p-value may be
introduced. (Note that some authors use the term prob-value
while others use the term significance probability
instead of p-value.) Dudewicz (1976, p.313) defines it
as ... "the smallest $\alpha$ for which we would surely reject if
we observed" $T = t$. Bickel and Doksum (1977, p.170) and
Bhattacharyya and Johnson (1977, p.175), to name just two
others, give similar definitions. However, if we try to use
this same definition for sequential tests we run into a
problem, as we shall see.

If we consider a sequential probability ratio test
(SPRT) of a simple null against a simple alternative
hypothesis (Wald (1947) or Ghosh (1970)) and if $Z_n$ denotes
the value of the test statistic at stage n, then the
decision boundaries, a and b, are determined by the desired
values of $\alpha$ and $\beta$. The general procedure is to observe the
values of $Z_n$ sequentially and to

    accept $H_0$ if $Z_n < b$
    reject $H_0$ if $Z_n \ge a$
    continue by observing the next value $Z_{n+1}$ otherwise.

(See Figure 1.) The values of a and b can be found
approximately by taking


Figure  1.  Graphical  Representation  of  a  Sequential  Test. 
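
For reference, Wald's standard approximate boundaries for the SPRT are written out below. The display assumes that $Z_n$ is the log likelihood ratio of the alternative to the null, a scale this note does not state explicitly, so identifying it with Equations (1) is an assumption rather than a transcription.

```latex
a \approx \log\frac{1-\beta}{\alpha},
\qquad
b \approx \log\frac{\beta}{1-\alpha}
```

With $\alpha = \beta = .20$, for instance, these give $a \approx \log 4 \approx 1.386$ and $b \approx \log(1/4) \approx -1.386$.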
Given a sample path as shown in Figure 1 it becomes obvious
that there are difficulties in trying to extend the
definition of p-values to a sequential test. For one thing,
in order to find "the smallest $\alpha$ for which we would surely
reject" the null hypothesis, we would have to use Equations
(1), changing both a and b. We would then also have to know
the entire sample path and not just the value of the test
statistic at the time at which a decision is made.

If we now return to the simpler single sample test we
can find an alternative characterization of p-values which
lends itself more easily to generalization. Specifically,
"using the distribution of T under $H_0$, calculate the
probability P* [the significance probability or p-value] of the
occurrence of the observed value or more extreme values"
(Bhattacharyya and Johnson, 1977, p.180). In order to do
this we must determine which values should be considered
more extreme than the observed value. This determination
is generally not difficult in single sample tests but is
more difficult for sequential tests.

Assume that test boundaries $a_n$, $b_n$ have been given
such that for test statistic $Z_n$, we

    accept $H_0$ if $Z_n < b_n$
    reject $H_0$ if $Z_n \ge a_n$
    continue by observing the next value $Z_{n+1}$ otherwise.
Note that by setting $b_N = a_N$ we can obtain a truncated
sequential test or, equivalently, an N-stage (multi-stage)
test. If $b_1 = a_1$ we obtain an ordinary single stage test.
(In some cases the directions of the inequalities for
acceptance and rejection will have to be reversed. This
causes no real problem, however.) Our convention for
determining which values should be considered more extreme
than the observed values will be as follows:


(1)  A decision to reject at stage n is more extreme
     than one to reject at stage n + 1.

(2)  A reject decision at stage n with observed value
     $z_n$ is more extreme than a reject decision at
     stage n with observed value $z_n'$ if $z_n > z_n'$.

(3)  A decision to accept at stage n is more extreme
     than one to accept at stage n - 1.

(4)  An accept decision at stage n with observed value
     $z_n$ is more extreme than an accept decision at
     stage n with observed value $z_n'$ if $z_n < z_n'$.


In Figure 2 we show "decision points" $d_1, d_2, \ldots$ which
are possible final observed values of a test statistic.
These points are "ordered" in the sense that $d_1$ is "more
extreme" than $d_2$ which is "more extreme" than $d_3$, and so
on. With this convention of determining extremeness of
the final values of the test statistic we can find p-values.


Figure  2.  Ordered  Decision  Points 


Definition. For a sequential test, truncated or not,
with given test boundaries $a_n$, $b_n$, if the test terminates
at stage k with observed test statistic $z_k$, then the p-value
is defined by

    p-value = P[a test statistic as or more extreme than
              $z_k$ will be observed when $H_0$ is true].

If $H_0$ is composite and not simple, then the maximum
probability found when $H_0$ is true will be the p-value.

For simplicity we will assume that $H_0$ is a simple
hypothesis so that we will not have to be concerned with
finding the maximum probability under $H_0$. Notationally we
can define

(2)  $\alpha_i = P[\text{reject } H_0 \text{ at stage } i \mid H_0 \text{ true}]$
     $= P[\text{continuation at stages } 1, 2, \ldots, i-1 \text{ and } Z_i \ge a_i \mid H_0]$
     $= P[(b_1 \le Z_1 < a_1), \ldots, (b_{i-1} \le Z_{i-1} < a_{i-1}), (Z_i \ge a_i) \mid H_0]$

and for $z_{i0} \ge a_i$ define

(3)  $p_i^* = P[(b_1 \le Z_1 < a_1), \ldots, (b_{i-1} \le Z_{i-1} < a_{i-1}), (Z_i \ge z_{i0}) \mid H_0]$.

The p-value for a reject decision at stage i with final
observed value $z_{i0}$ can then be found from

    p-value $= \sum_{j=1}^{i-1} \alpha_j + p_i^*$.

(Note that the overall level of significance of the test
will be given by $\alpha = \sum \alpha_i$ with the sum taken over all
possible test stages. Also if a test is curtailed with
rejection at the ith stage, the p-value will be bounded since

    $\sum_{j=1}^{i-1} \alpha_j \le \text{p-value} \le \sum_{j=1}^{i} \alpha_j$.)

In a similar way we define

(4)  $\gamma_i = P[(b_1 \le Z_1 < a_1), \ldots, (b_{i-1} \le Z_{i-1} < a_{i-1}), (Z_i < b_i) \mid H_0]$

and for $z_{i0} < b_i$

(5)  $q_i^* = P[(b_1 \le Z_1 < a_1), \ldots, (b_{i-1} \le Z_{i-1} < a_{i-1}), (Z_i < z_{i0}) \mid H_0]$;

then the p-value for an accept decision at stage i with
final observed value $z_{i0}$ is

    p-value $= 1 - \left( \sum_{j=1}^{i-1} \gamma_j + q_i^* \right)$.

Here too, if the test is curtailed at an acceptance
boundary, bounds for the p-value can be given.

If the p-value is defined in this way, then when $H_0$
is true the distribution of the p-value will be uniform
over the interval [0, 1]. This same property, of course,
holds in the single sample testing situation.
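
The rejection-side p-value, the sum of $\alpha_j$ over earlier stages plus $p_i^*$, can also be approximated by simulation when the integrals are inconvenient. The sketch below is only an illustration and not the method used in this note: it assumes that $Z_n$ is a running sum of independent draws under $H_0$, and the function names, the standard normal increments, and the two-stage boundaries are all hypothetical.

```python
import random


def run_test(draw_increment, a, b, rng):
    """Simulate one test under H0; return (decision, stage, final z).

    Convention from the text: reject if Z_n >= a_n, accept if Z_n < b_n,
    otherwise continue. Requires a[-1] == b[-1] so the last stage decides.
    """
    z = 0.0
    for stage, (a_n, b_n) in enumerate(zip(a, b), start=1):
        z += draw_increment(rng)  # assumption: Z_n is a sum of i.i.d. terms
        if z >= a_n:
            return "reject", stage, z
        if z < b_n:
            return "accept", stage, z
    raise ValueError("boundaries must satisfy a[-1] == b[-1]")


def reject_p_value(stage, z_obs, draw_increment, a, b, n_sim=100_000, seed=0):
    """Monte Carlo estimate of sum_{j<i} alpha_j + p_i* for a rejection."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sim):
        decision, n, z = run_test(draw_increment, a, b, rng)
        # As or more extreme: rejected at an earlier stage, or rejected
        # at the same stage with a final value at least as large.
        if decision == "reject" and (n < stage or (n == stage and z >= z_obs)):
            hits += 1
    return hits / n_sim


def std_normal(rng):
    """Hypothetical H0 increment distribution used for illustration."""
    return rng.gauss(0.0, 1.0)


# Single stage special case (b_1 = a_1): the usual one-sided p-value.
p_single = reject_p_value(1, 1.96, std_normal, a=[0.0], b=[0.0])

# Two-stage test with illustrative boundaries, rejected at stage 2.
p_two = reject_p_value(2, 2.0, std_normal, a=[2.5, 1.0], b=[-1.0, 1.0])
print(p_single, p_two)
```

In the single stage case the estimate should be close to the familiar normal tail area of about .025, which gives a quick sanity check on the ordering logic before it is applied to a multi-stage plan.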

3.  APPLICATION  TO  MIL-STD-781C  TEST  DESIGNS 

While conceptually it is quite straightforward to use
Equations (2) - (5) to find p-values, the actual calculations
will typically involve numerical integration to find
the $\alpha_i$, $\gamma_i$, etc. We will illustrate the method of finding
p-values by considering just three of the test plans given
in MIL-STD-781C (1977) where the underlying random variable
of interest has an exponential distribution. Bryant and
Schmee (1979) considered the problem of finding confidence
intervals for the parameter $\theta$ of the exponential distribution
when using these test plans. Although the method is
applicable to all test plans, we will only consider test
plans IVC, VIC, and VIIC. We will begin with Plan VIC.

Here the discrimination ratio is 3, so we may consider the
test of

    $H_0: \theta = \theta_0 = 3$  vs  $H_1: \theta = \theta_1 = 1$

with $\alpha = \beta = .20$ (nominal values). If $X_1, X_2, \ldots$ are
independent exponential random variables with parameter $\theta$, then
we will use as test statistic $Z_n = X_1 + \cdots + X_n$. The decision
boundaries $a_n$ and $b_n$ are shown in Table 1. Note that because
of the relative magnitudes of $\theta_0$ and $\theta_1$ the accept-reject
regions will be interchanged, i.e. here $H_0$ will be
accepted if $Z_n > b_n$ and rejected if $Z_n < a_n$. The necessary
modifications in Equations (2) to (4) are easy to make.


Stage n    Reject boundary a_n    Accept boundary b_n

   1              0                      2.67
   2              0.36                   4.32
   3              4.50                   4.50

Table 1. Decision Boundaries for Test Plan VIC.
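
For this particular plan the stage-wise probabilities under $H_0$ have closed forms, because the joint density of $(Z_1, Z_2)$ is $\theta_0^{-2} e^{-z_2/\theta_0}$ on $0 < z_1 < z_2$. The Python sketch below is not the authors' program. It assumes that $\theta$ is the mean of the exponential distribution and applies the interchanged rule (accept if $Z_n > b_n$, reject if $Z_n < a_n$); the variable names are illustrative.

```python
import math

theta0 = 3.0                # assumed: mean of the exponential under H0
a = [0.0, 0.36, 4.50]       # reject boundaries a_n from Table 1
b = [2.67, 4.32, 4.50]      # accept boundaries b_n from Table 1

# gamma_1 = P[Z_1 > b_1]
gamma1 = math.exp(-b[0] / theta0)

# gamma_2 = P[Z_1 <= b_1, Z_2 > b_2]; because a_1 = 0 the test never
# rejects at stage 1, so continuation at stage 1 is just Z_1 <= b_1.
gamma2 = (b[0] / theta0) * math.exp(-b[1] / theta0)

# gamma_3 = P[continue at stages 1 and 2, Z_3 > b_3]. The continuation
# region is 0 < z1 < min(z2, b_1), a_2 <= z2 <= b_2; `area` is its area.
area = (b[0] ** 2 - a[1] ** 2) / 2.0 + b[0] * (b[1] - b[0])
gamma3 = area / theta0 ** 2 * math.exp(-b[2] / theta0)

# alpha_2 = P[Z_2 < a_2]; Z_1 <= Z_2 < a_2 < b_1 makes continuation
# at stage 1 automatic, so this is a gamma(2) distribution function.
x = a[1] / theta0
alpha2 = 1.0 - math.exp(-x) * (1.0 + x)

# Bounds on the p-value when the test is curtailed at acceptance:
# lower_i = 1 - sum_{j<=i} gamma_j, upper_i = 1 - sum_{j<i} gamma_j.
gammas = [gamma1, gamma2, gamma3]
bounds = []
accepted_so_far = 0.0
for g in gammas:
    upper = 1.0 - accepted_so_far
    accepted_so_far += g
    bounds.append((1.0 - accepted_so_far, upper))

overall_alpha = 1.0 - sum(gammas)   # a_3 = b_3, so the test must end by stage 3
for stage, (lower, upper) in enumerate(bounds, start=1):
    print(f"stage {stage}: {lower:.3f} <= p-value <= {upper:.3f}")
print(f"alpha_2 = {alpha2:.3f}, overall alpha = {overall_alpha:.3f}")
```

Under these assumptions the printed bounds agree, to three decimals, with the acceptance bounds tabulated for this plan in Table 2(b).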


By using numerical integration we were able to find
the values of $\alpha_i$ and $\gamma_i$ as well as p-values for various
terminal values in the rejection region. Since the test
plans call for curtailment of the tests when an acceptance
boundary is reached it is only possible to give bounds for
the p-value at acceptance but not the actual p-value. The
results for Test Plan VIC are shown in Table 2.
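
The numerical integration for a rejection at stage 3 of Plan VIC can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: $\theta$ is taken to be the mean of the exponential distribution, the interchanged rule is used (reject if $Z_n < a_n$), and a simple midpoint rule stands in for whatever quadrature was actually employed. The function and constant names are hypothetical.

```python
import math

THETA0 = 3.0                               # assumed mean under H0
A2, B1, B2, A3 = 0.36, 2.67, 4.32, 4.50    # boundaries from Table 1


def alpha2():
    """P[reject at stage 2] = P[Z_2 < a_2] under H0."""
    x = A2 / THETA0
    return 1.0 - math.exp(-x) * (1.0 + x)


def p_value_stage3(z, steps=20_000):
    """P-value for rejection at stage 3 with observed Z_3 = z (A2 <= z <= A3).

    Adds alpha_2 to P[continue at stages 1 and 2, Z_3 <= z]. The joint
    density of (Z_1, Z_2) is exp(-z2/theta)/theta^2 on 0 < z1 < z2, and
    P[Z_3 <= z | Z_2 = z2] = 1 - exp(-(z - z2)/theta) for z2 <= z.
    """
    if not A2 <= z <= A3:
        raise ValueError("z must lie in the stage 3 rejection range")
    upper = min(z, B2)                 # Z_2 must not exceed z or b_2
    h = (upper - A2) / steps
    total = 0.0
    for k in range(steps):
        z2 = A2 + (k + 0.5) * h        # midpoint of the k-th subinterval
        width = min(z2, B1)            # length of the allowed z1 range
        total += width * (math.exp(-z2 / THETA0) - math.exp(-z / THETA0))
    return alpha2() + total * h / THETA0 ** 2


for z in (2.00, 3.00, 4.50):
    print(f"z = {z:.2f}: p-value = {p_value_stage3(z):.3f}")
```

Evaluating the function at tabulated terminal values gives a way to compare this reading of the boundaries against the stage 3 entries of Table 2(a).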
Test Plan IVC has nominal values of $\alpha = \beta = .2$ with
a discrimination ratio of 2 while Test Plan VIIC has nominal
values of $\alpha = \beta = .30$ with discrimination ratio 1.5.
The decision boundaries for these test plans are shown in
Table 3. The p-values for these test plans are given in
Tables 4 and 5.

4. AN EXAMPLE

The examples we give here follow the examples given
by Bryant and Schmee (1979). Specifically, Neathammer,
Pabst, and Wigginton (1965) describe a production reliability
acceptance test of a black box item for an aircraft.
In this problem the risks are to be $\alpha = \beta = .2$
and the discrimination ratio d = 2. Consequently test
plan IVC would be appropriate.

Now assume that in an actual test the failures occurred
at (scaled) accumulated test times of 1.0, 1.8,
2.4, 5.0, and 7.8 hours. By looking at the test boundaries
shown in Table 3 we see that the test should be continued
at each of the first five stages. Assume that the
sixth failure does not occur prior to the accumulated time
of 9.74 hours. Then the test will be curtailed with
acceptance at this time. Without knowing the actual time
of the sixth failure it is not possible to give the p-value
exactly, but lower and upper bounds on the p-value can be
found from Table 4(b). Since the decision occurs at the
sixth stage we find the bounds to be:

    lower bound = .299 $\le$ p-value $\le$ .334 = upper bound

Next we will consider a test which ends in rejection
rather than acceptance. If in an actual test the failures
occurred at accumulated test times of 1.0, 1.8, 2.4, and
3.0 hours, then the test would result in rejection at the
fourth stage. The p-value can be found from Table 4(a).
At stage 4 with a final observed value of 3.00, the p-value
can be seen to be .127.
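
Both lookups in this example can be expressed as a short Python sketch that transcribes the relevant rows of Tables 4(a) and 4(b). The function names are hypothetical, and the linear interpolation between tabulated values is an assumption of the sketch (the note itself only reads off tabulated entries).

```python
# Lower bounds at acceptance from Table 4(b), Plan IVC, stages 1 to 7.
# The upper bound at stage i equals the lower bound at stage i - 1
# (and is 1.000 at stage 1).
LOWER_4B = [0.753, 0.580, 0.464, 0.386, 0.334, 0.299, 0.253]

# Stage 4 row of Table 4(a): observed z and the corresponding p-value.
Z_STAGE4 = [2.20, 2.40, 2.60, 2.80, 3.00, 3.20, 3.40, 3.46]
P_STAGE4 = [0.110, 0.112, 0.115, 0.120, 0.127, 0.136, 0.146, 0.149]


def acceptance_bounds(stage):
    """Return (lower, upper) bounds on the p-value for curtailed acceptance."""
    lower = LOWER_4B[stage - 1]
    upper = 1.0 if stage == 1 else LOWER_4B[stage - 2]
    return lower, upper


def rejection_p_value_stage4(z):
    """P-value for rejection at stage 4; interpolates linearly between
    tabulated points (interpolation is an assumption, not from the note)."""
    for i in range(len(Z_STAGE4) - 1):
        z_lo, z_hi = Z_STAGE4[i], Z_STAGE4[i + 1]
        if z_lo <= z <= z_hi:
            t = (z - z_lo) / (z_hi - z_lo)
            return P_STAGE4[i] + t * (P_STAGE4[i + 1] - P_STAGE4[i])
    raise ValueError("z is outside the tabulated stage 4 range")


print(acceptance_bounds(6))              # acceptance example above
print(rejection_p_value_stage4(3.00))    # rejection example above
```

For the two cases worked in the text this returns the bounds (.299, .334) and a p-value of .127 up to floating-point rounding.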
(a) P-values at Rejection

Stage 1
Observed z    0
P-value       0

Stage 2
Observed z    .10    .20    .30    .36
P-value       .001   .002   .005   .007

Stage 3
Observed z    .40    .60    .80    1.00   1.20   1.40   1.60   1.80
P-value       .007   .007   .008   .010   .013   .017   .021   .027

Stage 3 (continued)
Observed z    2.00   2.20   2.40   2.60   2.80   3.00   3.20   3.40
P-value       .034   .042   .051   .061   .071   .083   .095   .108

Stage 3 (continued)
Observed z    3.60   3.80   4.00   4.20   4.40   4.50
P-value       .121   .135   .148   .162   .176   .182

(b) P-values at Acceptance

Stage          1       2       3
Lower bound    .589    .378    .182
Upper bound    1.000   .589    .378

Table 2. P-values for Test Plan VIC


[Table 3. Decision Boundaries for Test Plans IVC and VIIC]


(a) P-values at Rejection

Stage 1
Observed z    0
P-value       0

Stage 2
Observed z    .20    .40    .60    .70
P-value       .005   .018   .037   .049

Stage 3
Observed z    .80    1.00   1.20   1.40   1.60   1.80   2.00   2.08
P-value       .049   .052   .057   .065   .075   .088   .103   .109

Stage 4
Observed z    2.20   2.40   2.60   2.80   3.00   3.20   3.40   3.46
P-value       .110   .112   .115   .120   .127   .136   .146   .149

Stage 5
Observed z    3.60   3.80   4.00   4.20   4.40   4.60   4.80   4.86
P-value       .149   .151   .153   .157   .162   .167   .174   .176

Stage 6
Observed z    5.00   5.20   5.40   5.60   5.80   6.00   6.20   6.24
P-value       .176   .177   .179   .181   .185   .188   .193   .194

Stage 7
Observed z    6.40   6.60   6.80   7.00   7.20   7.40   7.60   7.62
P-value       .194   .195   .196   .198   .200   .202   .205   .206

Stage 8
Observed z    8.00   8.40   8.80   9.00   9.20   9.40   9.60   9.74
P-value       .206   .208   .211   .213   .216   .218   .221   .223

Table 4(a). P-values at Rejection for Test Plan IVC


(b) P-values at Acceptance

Stage          1       2       3       4       5       6       7
Lower bound    .753    .580    .464    .386    .334    .299    .253
Upper bound    1.000   .753    .580    .464    .386    .334    .299

Table 4(b). P-values at Acceptance for Test Plan IVC


(a) P-values at Rejection

Stage 1
Observed z    0
P-value       0

Stage 2
Observed z    0
P-value       0

Stage 3
Observed z    .20    .40    .60    .80    1.00   1
P-value       .000+  .003   .008   .017   .030   .1

Stage 4
Observed z    1.40   1.60   1.80   2.00   2.20   2.40   2.43
P-value       .050   .054   .061   .070   .082   .097   .099

Stage 5
Observed z    2.60   2.80   3.00   3.20   3.40   3.60   3.65
P-value       .100   .103   .108   .116   .125   .136   .140

Stage 6
Observed z    4.00   4.20   4.40   4.60   4.80   5.00   5.20
P-value       .142   .146   .152   .159   .168   .178   .190

Stage 7
Observed z    5.40   5.60   5.80   6.00   6.20   6.40   6.60
P-value       .203   .217   .233   .249   .266   .283   .301

Table 5(a). P-values at Rejection for Test Plan VIIC

(b) P-values at Acceptance

Stage          1       2       3       4       5       6
Lower bound    .878    .764    .669    .593    .455    .319
Upper bound    1.000   .878    .764    .669    .593    .455

Table 5(b). P-values at Acceptance for Test Plan VIIC